Faster Policy Adaptation in Environments with Exogeneity: A State Augmentation Approach
Authors
Abstract
The reinforcement learning literature typically assumes fixed state transition functions for the sake of tractability. However, in many real-world tasks, the state transition function changes over time, and this change may be governed by exogenous variables outside of the control loop. This can make policy learning difficult. In this paper, we propose a new algorithm to address the aforementioned challenge by embedding the state transition functions at different timestamps into a Reproducing Kernel Hilbert Space; the exogenous variable, as the cause of the state transition evolution, is estimated by projecting the embeddings into the subspace that preserves maximum variance. By augmenting the observable state vector with the estimated exogenous variable, standard RL algorithms such as Q-learning are able to learn faster and better. Experiments with both synthetic and real data demonstrate the superiority of our proposed algorithm over standard and advanced variants of Q-learning algorithms in dynamic environments.
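The pipeline described in the abstract can be illustrated with a rough sketch. This is not the authors' implementation. It assumes that transitions are grouped into time windows, that each window's transition behaviour is summarised by an empirical kernel mean embedding of its (s, a, s') samples under an RBF kernel, and that the maximum-variance projection is computed as kernel PCA on the Gram matrix between windows. All function names, the bandwidth `gamma`, and the toy dynamics are hypothetical choices made for illustration only.

```python
import numpy as np

def rbf_kernel(X, Y, gamma=0.5):
    """Pairwise RBF kernel values between the rows of X and the rows of Y."""
    sq = (X**2).sum(axis=1)[:, None] + (Y**2).sum(axis=1)[None, :] - 2.0 * X @ Y.T
    return np.exp(-gamma * np.maximum(sq, 0.0))

def embedding_gram(windows, gamma=0.5):
    """Inner products between the empirical mean embeddings of each window."""
    T = len(windows)
    G = np.empty((T, T))
    for i in range(T):
        for j in range(i, T):
            G[i, j] = G[j, i] = rbf_kernel(windows[i], windows[j], gamma).mean()
    return G

def estimate_exogenous(windows, gamma=0.5, n_components=1):
    """Project window embeddings onto their top principal directions (kernel PCA)."""
    G = embedding_gram(windows, gamma)
    T = G.shape[0]
    H = np.eye(T) - np.ones((T, T)) / T        # centering matrix
    eigvals, eigvecs = np.linalg.eigh(H @ G @ H)
    order = np.argsort(eigvals)[::-1][:n_components]
    scale = np.sqrt(np.maximum(eigvals[order], 0.0))
    return eigvecs[:, order] * scale           # shape (T, n_components)

def augment_state(state, u_t):
    """Append the estimated exogenous variable to the observable state."""
    return np.concatenate([np.atleast_1d(state), np.atleast_1d(u_t)])

# Toy data (assumed dynamics): s' = s + a * theta_t + noise, where theta_t is
# the hidden exogenous variable that drifts from one window to the next.
rng = np.random.default_rng(0)
theta = np.linspace(-1.0, 1.0, 20)
windows = []
for th in theta:
    s = rng.normal(size=300)
    a = rng.choice([-1.0, 1.0], size=300)
    s_next = s + a * th + 0.1 * rng.normal(size=300)
    windows.append(np.column_stack([s, a, s_next]))

u = estimate_exogenous(windows)                # one estimate per window
augmented = augment_state(np.array([0.5]), u[3])
```

In this sketch the estimate `u` is identified only up to sign and scale, which does not matter for a learner such as Q-learning that simply receives `augmented` as its state input.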
Similar resources
Multi-Agent Learning with Policy Prediction
Due to the non-stationary environment, learning in multi-agent systems is a challenging problem. This paper first introduces a new gradient-based learning algorithm, augmenting the basic gradient ascent approach with policy prediction. We prove that this augmentation results in a stronger notion of convergence than the basic gradient ascent, that is, strategies converge to a Nash equilibrium wi...
Principles of the ‘Lingua Franca Approach’ and their implications for pedagogical practice in the Iranian context
Abstract: The last thirty-five years have created a challenging situation for Iran and its people: on the one hand, the discriminatory British and American policies towards the country have given rise to considerable bitterness; on the other, we continue to teach both British and American English. If Iranian people wish to play a more active role internationally, it is time to review our English ...
Identity and Its Impact on Yemen's Foreign Policy in the Post-1990 Unification Era
After the 1990 unification in Yemen, the country’s officials made efforts to adopt a new turn of behaviors in various areas, especially their foreign policy, in accordance to the created changes to prove themselves to their peripheral and international environments. Based thereupon, they did their best to define a new identity for the unified Yemen and identify their foreign policy and behavior...
Contextualizing Obesity and Diabetes Policy: Exploring a Nested Statistical and Constructivist Approach at the Cross-National and Subnational Government Level in the United States and Brazil
Background: This article conducts a comparative national and subnational government analysis of the political, economic, and ideational constructivist contextual factors facilitating the adoption of obesity and diabetes policy. Methods: We adopt a nested analytical approach to policy analysis, which combines cross-national statistical analysis with subnational case study comparisons to examine...
Comparing the Effect of Reamed Exchange Nailing and Augmentation Compression Plating in Treatment of Femoral Shaft Nonunion
Background and purpose: One of the most important complications of bone fractures is nonunion. This research aimed at comparing the effect of exchange nailing and plate augmentation on complications after surgery and the time to achieve desired outcomes. Materials and methods: This descriptive-analytical study was conducted in 12 femoral shaft nonunion cases treated operatively in 2011-2018 (...
Journal title:
Volume, issue:
Pages: -
Publication date: 2017